Japanese Morphological Analyzer using Word Co-occurence -JTAG
نویسندگان
چکیده
We developed a Japanese morphological analyzer that uses the co-occurrence of words to select the correct sequence of words in an unsegmented Japanese sentence. The co-occurrence information can be obtained from cases where the system incorrectly analyzes sentences. As the amount of information increases, the accuracy of the system increases with a small risk of degradation. Experimental results show that the proposed system assigns the correct phonological representations to unsegmented Japanese sentences more precisely than do other popular systems.
منابع مشابه
A Morphological Analyzer for Japanese Nouns, Verbs and Adjectives
We present an open source morphological analyzer for Japanese nouns, verbs and adjectives. The system builds upon the morphological analyzing capabilities of MeCab [Matsumoto et al., 1999] to incorporate finer details of classification such as politeness, tense, mood and voice attributes. We implemented our analyzer in the form of a finite state transducer using the open source finite state com...
متن کاملThe Mega-Word Tagged-Corpus Project
Large corpora with part-of-speech tagging play a very important role in recent statisticsbased and example-based natural language processing systems. However, no such corpora have become widely available for Japanese so far. Because the Japanese language has no explicit word boundaries, it is impossible even to count words without a corpus that has at. least word segmentations. This paper descr...
متن کاملHMM Parameter Learning for Japanese Morphological Analyzer
This paper presents a method to apply Hidden Markov Model (HMM) to parameter learning for Japanese morphological analyzer. We especially emphasize how the following two information sources affect the results of the parameter learning: 1) The initial value of parameters, i.e., the initial probabilities and 2) some grammatical constraints that hold in Japanese sentences independently of any domai...
متن کاملMostly-Unsupervised Statistical Segmentation of Japanese: Applications to Kanji
Given the lack of word delimiters in written Japanese, word segmentation is generally considered a crucial first step in processing Japanese texts. Typical Japanese segmentation algorithms rely either on a lexicon and grammar or on pre-segmented data. In contrast, we introduce a novel statistical method utilizing unsegmented training data, with performance on kanji sequences comparable to and s...
متن کاملExample-Based Correction of Word Segmentation and Part of Speech Labelling
This paper describes an example-based correction component for Japanese word segmentation and part of speech labelling (AMED), and a way of combining it with a pre-existing rule-based Japanese morphological analyzer and a probabilistic part of speech tagger. Statistical algorithms rely on frequency of phenomena or events in corpora; however, low frequency events are often inadequately represent...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 1998